Updating the partial singular value decomposition in latent semantic indexing

Authors

  • Jane Tougas
  • Raymond J. Spiteri
Abstract

Latent semantic indexing (LSI) is a method of information retrieval that relies heavily on the partial singular value decomposition (PSVD) of the term-document matrix representation of a dataset. Calculating the PSVD of large term-document matrices is computationally expensive; hence in the case where terms or documents are merely added to an existing dataset, it is extremely beneficial to update the previously calculated PSVD to reflect the changes. It is shown how updating can be used in LSI to significantly reduce the computational cost of finding the PSVD without significantly impacting performance. Moreover, it is shown how the computational cost can be reduced further, again without impacting performance, through a combination of updating and folding-in.
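The difference between folding-in and updating when documents are added can be illustrated with a minimal NumPy sketch. The abstract does not say which updating algorithm the paper uses. The update below is one commonly described scheme (projection, QR factorization, then the SVD of a small matrix) and is an assumption here, as are all function names and the dense-matrix setup.

```python
import numpy as np


def partial_svd(A, k):
    """Rank-k PSVD of A: returns U (m x k), s (k,), V (n x k)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in_documents(U, s, V, D):
    """Fold-in: project new document columns D (m x p) onto the existing
    term space. U and s are left unchanged, so this is cheap but the
    factors no longer form an exact PSVD of the enlarged matrix."""
    V_new = (D.T @ U) / s  # each new row is d^T U S^{-1}
    return U, s, np.vstack([V, V_new])


def update_documents(U, s, V, D):
    """Update (assumed scheme): rank-k PSVD of [A_k, D], where
    A_k = U diag(s) V^T, without recomputing a full SVD.
    Assumes the part of D orthogonal to U has full column rank."""
    k, p = s.size, D.shape[1]
    UtD = U.T @ D                        # component of D inside span(U)
    Q, R = np.linalg.qr(D - U @ UtD)     # orthogonal remainder, Q is m x p
    # Small (k+p) x (k+p) matrix whose SVD carries the update.
    K = np.block([[np.diag(s), UtD],
                  [np.zeros((p, k)), R]])
    Uk, sk, Vkt = np.linalg.svd(K)
    U_new = np.hstack([U, Q]) @ Uk[:, :k]
    V_big = np.block([[V, np.zeros((V.shape[0], p))],
                      [np.zeros((p, k)), np.eye(p)]])
    V_new = V_big @ Vkt[:k, :].T
    return U_new, sk[:k], V_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((20, 12))   # hypothetical term-document matrix
    D = rng.random((20, 2))    # two new documents
    U, s, V = partial_svd(A, k=3)
    _, s_fold, V_fold = fold_in_documents(U, s, V, D)
    U_up, s_up, V_up = update_documents(U, s, V, D)
    print("fold-in keeps singular values:", np.array_equal(s, s_fold))
    print("updated singular values:", s_up)
```

In this sketch the only SVD taken during an update is of a matrix of size (k+p) x (k+p), which is where the saving over recomputing the PSVD of the full enlarged matrix would come from. Fold-in is cheaper still, at the cost of leaving the term space unchanged.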


Similar resources

Matrices with Low-Rank-Plus-Shift Structure: Partial SVD and Latent Semantic Indexing

We present a detailed analysis of matrices satisfying the so-called low-rank-plus-shift property in connection with the computation of their partial singular value decomposition. The application we have in mind is Latent Semantic Indexing for information retrieval where the term-document matrices generated from a text corpus approximately satisfy this property. The analysis is motivated by develo...


Using Random Indexing to improve Singular Value Decomposition for Latent Semantic Analysis

We present results from using Random Indexing for Latent Semantic Analysis to handle Singular Value Decomposition tractability issues. We compare Latent Semantic Analysis, Random Indexing and Latent Semantic Analysis on Random Indexing reduced matrices. In this study we use a corpus comprising 1003 documents from the MEDLINE-corpus. Our results show that Latent Semantic Analysis on Random Index...


Two Uses for Updating the Partial Singular Value Decomposition in Latent Semantic Indexing

Latent Semantic Indexing (LSI) is an information retrieval (IR) method that connects IR with numerical linear algebra by representing a dataset as a term-document matrix. Because of the tremendous size of modern databases, such matrices can be very large. The partial singular value decomposition (PSVD) is a matrix factorization that captures the salient features of a matrix, while using much le...


Parallel SVD Computation in Updating Problems of Latent Semantic Indexing

In latent semantic indexing, the addition of documents (or the addition of terms) to some already processed text collection leads to the updating of the best rank-k approximation of the term-document matrix. The computationally most intensive task in this updating is the computation of the singular value decomposition (SVD) of a certain square matrix, which is upper or lower triangular, and conta...


EDLSI with PSVD Updating

This paper describes the results obtained from the merging of two techniques that provide improvements to search and retrieval using Latent Semantic Indexing (LSI): Essential Dimensions of LSI (EDLSI) and partial singular value decomposition (PSVD) updating. EDLSI utilizes an implementation of LSI that requires the use of only a few dimensions in the LSI space. The PSVD updating and folding-up ...



Journal:
  • Computational Statistics & Data Analysis

Volume 52

Publication date: 2007